Gene Expression Profiling of DNA Microarray Data using Peano Count Trees (P-Trees)
Authors
Abstract
The explosion of genomic data made possible by advances in parallel, high-throughput technologies in molecular biology has ushered in a new era in bioinformatics. For many years, efforts concentrated on sequencing the genomes of organisms. The current emphasis lies in extracting meaningful information from this huge volume of DNA sequence and expression data. The techniques currently employed to analyze microarray expression data are clustering and classification. These techniques have their own limitations as to the amount of useful information that can be derived. In this paper, we propose a new approach to mining microarray data using a new data mining technology called the Peano Count Tree (P-tree)¹. This technology employs association rule mining, an advanced data mining technique that is useful for deriving meaningful rules from a given data set, as the means of mining microarray expression data. Our approach involves a "data mining ready" data structure, the P-tree, to measure gene expression levels. The method treats the microarray data as spatial data. Each spot on the microarray is represented as a pixel with corresponding red and green ratios. The microarray data are reorganized into an 8-bit bSQ file (where each attribute or band is stored as a separate file). Each bit is then converted into a quadrant-based tree structure, the P-tree, from which a data cube is constructed and meaningful rules are readily obtained.
¹ P-tree technology is patented by North Dakota State University.
Proc. of the Virt. Conf. in Genom. and Bioinf., North Dakota State University, USA, October 15-16, 2001. © www.ndsu.edu/virtual-genomics
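The pipeline described in the abstract (8-bit values split into bit planes, each plane turned into a quadrant-based count tree) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that bSQ means one binary plane per bit position of a band, that the spot grid is square with a power-of-two side, and that quadrants are visited in the order upper-left, upper-right, lower-left, lower-right. The names `bit_planes`, `build_ptree`, and `demo_band` are hypothetical, and the sample values are invented for illustration.

```python
import numpy as np


def bit_planes(band, bits=8):
    """Split an integer band into `bits` binary planes, most significant bit first."""
    band = np.asarray(band, dtype=np.uint8)
    return [(band >> (bits - 1 - i)) & 1 for i in range(bits)]


def build_ptree(plane):
    """Build a quadrant count tree for one square binary plane.

    Returns (count, children): `count` is the number of 1 bits in the
    region, and `children` is None when the region is pure (all 0s or
    all 1s) or a single cell, otherwise a list of four subtrees.
    """
    plane = np.asarray(plane)
    n = plane.shape[0]
    count = int(plane.sum())
    if n == 1 or count == 0 or count == n * n:
        return (count, None)
    h = n // 2
    # Assumed quadrant order: upper-left, upper-right, lower-left, lower-right.
    quadrants = [plane[:h, :h], plane[:h, h:], plane[h:, :h], plane[h:, h:]]
    return (count, [build_ptree(q) for q in quadrants])


# Hypothetical 4x4 grid of red-channel ratios already scaled to 0..255.
demo_band = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [128, 0, 0, 0],
    [0, 0, 0, 64],
]

planes = bit_planes(demo_band)   # eight 4x4 binary planes
tree = build_ptree(planes[0])    # tree for the most significant bit
print(tree)
```

In this sketch the root count of each tree is the number of spots whose value has that bit set, so pure quadrants are stored as a single count instead of being expanded cell by cell.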
Similar resources
Peano Count Trees (P-Trees) and Rule Association Mining for Gene Expression Profiling of Microarray Data
Valdivia Granda W A, Perrizo W, Larson F, and Deckard E L. Department of Plant Sciences, Loftsgard Hall Office C, North Dakota State University, Fargo ND, United States of America. Department of Computer Science, Loftsgard Hall Office C, North Dakota State University, Fargo ND, United States of America. Information Technology Services, Loftsgard...
Expression Profiling of Microarray Gene Signatures in Acute and Chronic Myeloid Leukaemia in Human Bone Marrow
Background: Classification of cancer subtypes by means of microarray signatures is becoming increasingly difficult to ignore for its potential to transform pathological diagnosis; nonetheless, measurement of indicator genes in routine practice appears to be arduous. In a preceding published study, we utilized real-time PCR measurement of indicator genes in acute lymphoid leukaemia (ALL) and acute m...
A microarray-based method for the parallel analysis of genotypes and expression profiles of wood-forming tissues in Eucalyptus grandis
BACKGROUND: Fast-growing Eucalyptus grandis trees are one of the most efficient producers of wood in South Africa. The most serious problem affecting the quality and yield of solid wood products is the occurrence of end splitting in logs. Selection of E. grandis planting stock that exhibits preferred wood qualities is thus a priority of the South African forestry industry. We used microarray-base...
Multimedia Data Mining Using P-Trees
The DataSURG group at NDSU has a long-standing interest in data mining remotely sensed imagery (RSI) for agricultural, forestry and other prediction and analysis applications. A spatial data structure, the Peano count tree, was developed that provided an efficient, lossless, data mining ready representation of the many types of data involved in these applications. This data structure has made p...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data give access to a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce high-dimensional data with a statistical method, selecting valuable, high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...